Abstract

This paper gives a critical account of the minority game literature. The minority game is a simple congestion game: players need to choose between two options, and those who have selected the option chosen by the minority win. The learning model proposed in this literature seems to differ markedly from the learning models commonly used in economics. We relate the learning model from the minority game literature to standard game-theoretic learning models, and show that in fact it shares many features with these models. However, the predictions of the learning model differ considerably from the predictions of most other learning models. We discuss the main predictions of the learning model proposed in the minority game literature, and compare these to experimental findings on congestion games.



JEL classification: C73, C90.

Keywords: Learning, congestion games, experiments.



*Address: Tilburg University, P.O. Box 90153, 5000 LE Tilburg, The Netherlands. E-mail: w.kets@uvt.nl. Tel: +31-13-4662478. Fax: +31-13-4663280. I am indebted to Ginestra Bianconi, George Ehrhardt, Doyne Farmer, Matteo Marsili, Esteban Moro, Jan Potters, Dolf Talman, and Mark Voorneveld for inspiring discussions and helpful comments and suggestions. In addition, I would like to thank Esteban Moro for his kind permission for reproducing some of the figures from Moro (2003). All remaining errors are of course my own.



1 Introduction 



Congestion games are ubiquitous in economics. In a congestion game (Rosenthal, 1973), players use several facilities from a common pool. The costs or benefits that a player derives from a facility depend on the number of users of that facility. A congestion game is therefore a natural game to model scarcity of common resources. Examples of such systems include vehicular traffic (Nagel et al., 1997), packet traffic in networks (Huberman and Lukose, 1997), and ecologies of foraging animals (DeAngelis and Gross, 1992). Similar coordination problems are encountered in market entry games (Selten and Güth, 1982).

Congestion games are also interesting from a theoretical point of view. In congestion games, players need to coordinate to differentiate. This seems to be more difficult than coordinating on the same action, as any commonality of expectations is broken up. For instance, when commuters have to choose between two roads A and B and all believe that the others will choose road A, nobody will choose that road, invalidating beliefs. The sorting of players predicted in the pure-strategy Nash equilibria of such games violates the common belief that in symmetric games, all rational players will evaluate the situation identically, and hence, make the same choices in similar situations (see Harsanyi and Selten, 1988, p. 73). Moreover, in congestion games, players may obtain asymmetric payoffs in equilibrium, which may complicate attainment of equilibrium, as coordination cannot be achieved through tacit coordination based on historical precedent (cf. Meyer et al., 1992). Finally, congestion games often have many equilibria, so that players also face the difficulty of coordinating on the same equilibrium.

Nevertheless, the theory of learning in games provides sharp predictions on players' behavior in congestion games. As congestion games belong to the class of potential games (Monderer and Shapley, 1996b), all results that have been derived for potential games apply to the class of congestion games.^1 Experimental evidence, however, is not always in line with these predictions. Though several experimental studies have shown that players are remarkably successful at learning to coordinate in congestion games,^2 regularities on



^1 See e.g. Hofbauer and Hopkins (2005), Hofbauer and Sandholm (2002), Monderer and Shapley (1996b), and Sandholm (2001). Kets and Voorneveld (2007) study the convergence of play under different learning processes in the minority game.



entry games, which is accounted for on the aggregate level by the Nash equilibrium solution (Kahneman, 1988; Rapoport, 1995; Sundali et al., 1995; Erev and Rapoport, 1998; Rapoport et al., 1998, 2000). See e.g. Meyer et al. (1992) and Selten et al. (2007) for similar results on related games.
the aggregate level generally conceal non-equilibrium behavior at the individual level. Even though aggregate play is close to the Nash equilibrium, individual players generally do not play equilibrium strategies.^3 Moreover, providing players with more information does not always lead to better outcomes.^4

These experimental findings are hard to explain with standard learning models. This paper discusses the literature on the minority game, a simple congestion game based on the El Farol bar problem of Arthur (1994). Players have to choose between two alternatives. Only those who have chosen the minority side get a positive payoff. The minority game literature proposes a learning model that is able to account for many of the experimental findings listed above. We relate this learning model to the standard learning models in economics, and compare its predictions to experimental results on congestion games. The contribution of the current paper is that it relates the literature on the minority game, which has been largely developed in physics, to the literature on learning in game theory and to the literature in experimental economics on congestion games.^5

The outline of this paper is as follows. In Section 2 we introduce the minority game and discuss its equilibria. The learning model proposed in the minority game literature is discussed in Section 3. In Section 4 we discuss the main predictions from the learning model. These predictions are compared to experimental results on congestion games in Section 5. Section 6 concludes.



2 The stage game 

The minority game is a game in which an odd number of players have to choose between 
two actions; for instance, players either go to a bar or stay home, either buy or sell an asset, 
etcetera. Players want to distinguish themselves from the crowd: their aim is to take a 
different action than the majority of players. 



^3 See e.g. Meyer et al. (1992), Erev and Rapoport (1998), Selten et al. (2007), and Bottazzi and Devetag.



^4 For instance, in their experiments on market entry games, Erev and Rapoport (1998) find that providing players with information on other players' actions may actually lead to lower average payoffs.

^5 We have no intention of giving a comprehensive survey of the minority game literature, as an enormous amount of work on the minority game has been done. For an extensive collection of papers on the minority game, see http://www.unifr.ch/econophysics/minority/. See Moro (2003), Challet et al. (2004) or Coolen (2005) for an introduction to the field. Papers in economics on the minority game include Blonski (1999), Bottazzi and Devetag (2007), Chmura and Pitz, and Renault et al. Kojima and Takahashi (2004) study learning in games very similar to the minority game.
Following the notation of Tercieux and Voorneveld (2005), we denote the set of players by $\mathcal{N} = \{1, \ldots, 2k+1\}$, with $k \in \mathbb{N}$. Each player $i \in \mathcal{N}$ has a set of pure strategies $A_i = \{-1, +1\}$: agents have to choose between two options. The set of mixed strategies of player $i$ is denoted by $\Delta(A_i)$. We denote a mixed strategy profile by $\alpha \in \times_{j \in \mathcal{N}} \Delta(A_j)$, and we use the standard notation $\alpha_{-i} \in \times_{j \in \mathcal{N} \setminus \{i\}} \Delta(A_j)$ to denote a strategy profile of players other than $i \in \mathcal{N}$. With each action $a \in \{-1, +1\}$, a function

$$f_a : \{1, \ldots, 2k+1\} \to \mathbb{R}$$

can be associated which indicates for each $n \in \{1, \ldots, 2k+1\}$ the payoffs to a player choosing $a$ when the total number of players choosing $a$ equals $n$. The von Neumann-Morgenstern utility function of a player is then given by

$$u_i(a) = f_{a_i}\bigl(|\{j \in \mathcal{N} : a_j = a_i\}|\bigr), \qquad (2.1)$$

where $a \in \times_{j \in \mathcal{N}} A_j$. Payoffs are extended to mixed strategies in the usual way.

The function $f_a(\cdot)$, $a \in \{-1, +1\}$, can have several forms. It is commonly assumed that congestion is costly:

[Mon] $f_{-1}$ and $f_{+1}$ are strictly decreasing functions,

and that the congestion effect is the same across alternatives:

[Sym] $f_{-1} = f_{+1}$.

A commonly used form is $f_{-1}(n) = f_{+1}(n) = 1$ if $n \in \{1, \ldots, k\}$ and $0$ otherwise (Challet and Zhang, 1997). Alternatively, one could define payoffs in terms of the aggregate action $\sum_{i \in \mathcal{N}} a_i$ for a given action profile $a = (a_i)_{i \in \mathcal{N}}$, with $a_i \in \{-1, +1\}$ for all $i$. Let $g$ be a function on $\mathbb{R}$ such that $g(-x) = -g(x)$ for all $x \in \mathbb{R}$ and $g(x) > 0$ for $x > 0$. A player $i \in \mathcal{N}$ is then assigned the payoff

$$u_i(a) = -a_i \, g\Bigl(\sum_{j \in \mathcal{N}} a_j\Bigr). \qquad (2.2)$$

In our notation:

$$f_{-1}(n) = f_{+1}(n) = g(2(k-n)+1).$$

Common choices include

$$g(x) = x/(2k+1) \qquad (2.3)$$

and

$$g(x) = \mathrm{sign}(x).$$

Most of the predictions of the learning model are not affected qualitatively by the precise choice of payoff function, given that it satisfies [Mon] and [Sym] (Li et al., 2000). Notice that the minority game is a congestion game (Rosenthal, 1973) and hence a finite exact potential game (Monderer and Shapley, 1996b).
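The payoff rule (2.2) and its relation to the functions $f_a$ can be made concrete with a small numerical sketch. The Python code below is only an illustration under our own naming (`payoffs`, `f` and `sign` are hypothetical helper names, not taken from the literature); it evaluates (2.2) for a five-player profile under both choices of $g$ and compares the result with $g(2(k-n)+1)$.

```python
def sign(x):
    """Return -1, 0 or +1 according to the sign of x."""
    return (x > 0) - (x < 0)

def payoffs(actions, g):
    """Payoff of every player under Equation (2.2): u_i(a) = -a_i * g(sum_j a_j)."""
    aggregate = sum(actions)
    return [-a_i * g(aggregate) for a_i in actions]

k = 2                                    # 2k + 1 = 5 players

def linear(x):
    """Equation (2.3): g(x) = x / (2k + 1)."""
    return x / (2 * k + 1)

def f(n, g):
    """Payoff to a player whose side is chosen by n players: g(2(k - n) + 1)."""
    return g(2 * (k - n) + 1)

profile = [-1, -1, +1, +1, +1]           # two players choose -1, three choose +1
print(payoffs(profile, linear))          # the two minority players earn a positive payoff
print(payoffs(profile, sign))
print(f(2, linear), f(3, linear))        # payoffs on the side with 2 and with 3 players
```

In this example the two players on the minority side receive `f(2, g)` and the three players on the majority side receive `f(3, g)`, for either choice of `g`.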

To analyze the game's Nash equilibria, we introduce some more notation. A player who uses a mixed strategy that puts positive probability on both pure strategies is referred to as a mixer. A player that puts full probability mass on the alternative $-1$ is called a $(-1)$-player; similarly, a player that puts full probability mass on the alternative $+1$ is called a $(+1)$-player.

The stage game has a large number of Nash equilibria. Tercieux and Voorneveld (2005) show that a pure strategy profile is a Nash equilibrium if and only if one of the alternatives $-1$ or $+1$ is chosen by exactly $k$ of the $2k+1$ players. Note that these Nash equilibria are not strict, as a player that is in the majority is indifferent between sticking to his choice or switching actions, as his deviation would shift the majority. There are $2\binom{2k+1}{k}$ of such asymmetric pure-strategy Nash equilibria.
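For small games, this characterization can be checked by brute force. The following Python sketch is an illustration only; it assumes the linear payoff (2.3), and the function names are ours. It enumerates all $2^{2k+1}$ pure profiles, keeps those from which no player can strictly gain by switching sides, and compares the count with $2\binom{2k+1}{k}$.

```python
from itertools import product
from math import comb

def utility(i, profile, k):
    """Payoff (2.2) with g(x) = x / (2k + 1) to player i in a pure action profile."""
    return -profile[i] * sum(profile) / (2 * k + 1)

def is_pure_nash(profile, k):
    """True if no player gains strictly by unilaterally switching sides."""
    for i in range(len(profile)):
        deviation = list(profile)
        deviation[i] = -deviation[i]
        if utility(i, deviation, k) > utility(i, profile, k) + 1e-12:
            return False
    return True

def count_pure_nash(k):
    """Number of pure-strategy Nash equilibria of the game with 2k + 1 players."""
    n = 2 * k + 1
    return sum(is_pure_nash(p, k) for p in product((-1, +1), repeat=n))

for k in (1, 2, 3):
    print(k, count_pure_nash(k), 2 * comb(2 * k + 1, k))
```

The enumeration is exponential in the number of players, so it is only meant as a sanity check for very small $k$.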



Kets and Voorneveld (2007) characterize the game's mixed-strategy Nash equilibria. It can be shown that in any Nash equilibrium with at least one mixer, all mixers use the same mixed strategy. Moreover, player labels are irrelevant by [Sym] (if $\alpha$ is a Nash equilibrium, so is every permutation of $\alpha$). Together, these facts imply that a Nash equilibrium with at least one mixer can be summarized by its type $(\ell, r, \lambda)$, where $\ell, r \in \{0, 1, \ldots, 2k+1\}$ denote the number of players choosing pure strategy $-1$ or $+1$, respectively, and $\lambda \in (0, 1)$ the probability with which the remaining $z(\ell, r, \lambda) := (2k+1) - (\ell + r) > 0$ mixers choose $-1$. Let $v_{-1}(\ell, r, \lambda)$ denote the expected payoff to a player choosing $-1$; $v_{+1}(\ell, r, \lambda)$ is defined similarly. Then, a strategy profile of type $(\ell, r, \lambda)$ is a Nash equilibrium if and only if

$$v_{-1}(\ell+1, r, \lambda) = v_{+1}(\ell, r+1, \lambda), \qquad (2.4)$$

i.e., the expected payoffs to a mixer of playing the pure strategy $a = -1$ are equal to the expected payoffs of the pure strategy $a = +1$. It can be shown that there exist Nash equilibria with exactly one mixer. These equilibria are of type $(k, k, \lambda)$ with arbitrary $\lambda \in (0, 1)$, i.e., the mixer uses an arbitrary mixed strategy, whereas the remaining $2k$ players are spread evenly over the two pure strategies. In addition, there are Nash equilibria with more than one mixer. For $\ell, r \in \{0, 1, \ldots, 2k+1\}$ such that $\ell + r \leq 2k-1$, there is a Nash equilibrium of type $(\ell, r, \lambda)$ if and only if $\max\{\ell, r\} < k$. The corresponding probability $\lambda \in (0, 1)$ solves (2.4), and can be shown to be unique. It follows from these results that there is a unique symmetric mixed-strategy Nash equilibrium. In this equilibrium, each player chooses each option with probability $\frac{1}{2}$. The expected number of players choosing each option is then $k + \frac{1}{2}$.
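The indifference condition (2.4) is easy to solve numerically. The sketch below is a minimal illustration, assuming the linear payoff (2.3) and using bisection on $\lambda$; the function names are hypothetical. It relies on the fact that, for strictly decreasing $f_a$, the payoff difference between $-1$ and $+1$ falls as more mixers lean towards $-1$.

```python
from math import comb

def linear_f(n, k):
    """Payoff on a side chosen by n players, i.e. g(2(k - n) + 1) with g(x) = x / (2k + 1)."""
    return (2 * (k - n) + 1) / (2 * k + 1)

def expected_side_payoff(fixed, mixers, p, k, f=linear_f):
    """Expected payoff on a side with `fixed` sure users (including the player)
    and `mixers` other players who each join this side independently with probability p."""
    return sum(comb(mixers, j) * p**j * (1 - p)**(mixers - j) * f(fixed + j, k)
               for j in range(mixers + 1))

def indifference_gap(l, r, lam, k, f=linear_f):
    """Left-hand side minus right-hand side of Equation (2.4) for one of the mixers."""
    z = 2 * k + 1 - l - r                      # number of mixers
    v_minus = expected_side_payoff(l + 1, z - 1, lam, k, f)
    v_plus = expected_side_payoff(r + 1, z - 1, 1 - lam, k, f)
    return v_minus - v_plus

def solve_lambda(l, r, k, f=linear_f, tol=1e-12):
    """Bisection for the probability lam in (0, 1) that solves Equation (2.4)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if indifference_gap(l, r, mid, k, f) > 0:
            lo = mid                           # -1 still more attractive: raise lam
        else:
            hi = mid
    return (lo + hi) / 2

print(solve_lambda(0, 0, 2))   # symmetric case: all five players mix
print(solve_lambda(1, 0, 3))   # one (-1)-player and six mixers
```

With no pure players the computed probability is one half, in line with the symmetric equilibrium described above; with one $(-1)$-player the mixers compensate by choosing $-1$ less often.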



3 Learning in the minority game 



Players in the minority game face both a coordination problem and an incentive problem. The coordination problem is not easy to solve. As the equilibria in pure strategies cannot be Pareto-ranked or ordered in terms of risk-dominance, no particular pure-strategy Nash equilibrium can be singled out as being most salient (Schelling, 1960). Hence, without pre-play communication, players do not have enough information to implement a pure-strategy Nash equilibrium (cf. Menezes and Pitchford, 2006). While players could use common knowledge of rationality and symmetry to deduce and select the symmetric mixed-strategy Nash equilibrium (cf. Ochs, 1990; Meyer et al., 1992), this may raise an incentive problem, as players can earn a higher payoff than in the symmetric mixed-strategy Nash equilibrium if they manage to outsmart the other players. Hence, players may try to find patterns in the play of others when the game is played repeatedly (cf. Arthur, 1994; Meyer et al., 1992). The learning model proposed in the minority game literature provides a way of formalizing this notion. In this section, we first introduce the model, and then discuss its assumptions, relating the learning model to other learning models in the literature.



3.1 Model 

The stage game is played repeatedly. After each round of play $t$ of the stage game, the players are informed of the aggregate action $A(t) := \sum_{i \in \mathcal{N}} a_i(t)$, where $a_i(t) \in \{-1, +1\}$ is the action taken by player $i$ in round $t$. Furthermore, it is assumed that players only retain the sequence of the last $m$ winning groups $-\mathrm{sign}[A(t)]$, where $m \in \mathbb{N}$. Hence, in round $t$, players observe the $m$ most recent outcomes $h_m(t) = (-\mathrm{sign}[A(\tau)])_{\tau \in \{t-m, t-m+1, \ldots, t-1\}}$.

A response mode $s$ assigns to each information set $h_m \in H_m = \{(x_k)_{k=1,\ldots,m} \mid x_k \in \{-1, +1\}\}$ an action $a \in \{-1, +1\}$. That is, a response mode $s$ prescribes which action $s(h_m(t)) \in \{-1, +1\}$ to take, for a given history of play $h_m(t)$ at time $t$. There are $2^{2^m}$ different response modes: there are $2^m$ possible signals $h_m$ of length $m$, and for each signal,
| History $h_m$ | Action $s_{i,1}$ | Action $s_{i,2}$ | Action $s_{i,3}$ | Action $s_{i,4}$ |
|---|---|---|---|---|
| -1 -1 -1 | +1 | -1 | -1 | +1 |
| -1 -1 +1 | -1 | -1 | +1 | -1 |
| -1 +1 -1 | +1 | -1 | -1 | +1 |
| -1 +1 +1 | -1 | +1 | -1 | +1 |
| +1 -1 -1 | +1 | +1 | +1 | +1 |
| +1 -1 +1 | -1 | -1 | +1 | +1 |
| +1 +1 -1 | -1 | -1 | -1 | +1 |
| +1 +1 +1 | +1 | -1 | +1 | -1 |

Table 1: An example of a subset of response modes with $m = 3$ and $n_s = 4$ for some player $i \in \mathcal{N}$.

there are two possible actions. For memory length $m$, denote the set of all response modes by $S^{(m)}$. An important assumption in the minority game learning model is that each player $i \in \mathcal{N}$ is endowed with a subset $S_i$ of all possible response modes, with for each $i \in \mathcal{N}$ the response modes in $S_i$ drawn uniformly at random from $S^{(m)}$, independently across players. Results are then obtained by averaging over all possible assignments of response modes. This endowment is fixed for each player, and all players are endowed with the same number $n_s \geq 2$ of response modes. An example of such a subset of response modes for $n_s = 4$ and $m = 3$ is given in Table 1.
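A response mode is simply a lookup table from histories to actions, which suggests a direct representation in code. The following Python sketch is illustrative only: the function names are ours, and we assume that the $n_s$ response modes of a player are drawn independently with replacement, so that a player may occasionally be endowed with the same response mode twice (the text above does not say how such ties are handled).

```python
import random
from itertools import product

def all_histories(m):
    """All 2**m histories of the last m winning sides, each a tuple of -1/+1 entries."""
    return list(product((-1, +1), repeat=m))

def random_response_mode(m, rng):
    """Draw one response mode uniformly at random from all 2**(2**m) of them.
    Choosing an independent fair -1/+1 action for every history does exactly that."""
    return {h: rng.choice((-1, +1)) for h in all_histories(m)}

def endow(num_players, n_s, m, rng):
    """Endow every player with n_s response modes, independently across players
    (assumption: sampling with replacement)."""
    return [[random_response_mode(m, rng) for _ in range(n_s)]
            for _ in range(num_players)]

rng = random.Random(0)                   # fixed seed, for reproducibility only
endowment = endow(num_players=5, n_s=4, m=3, rng=rng)

print(len(all_histories(3)), 2 ** (2 ** 3))      # 8 histories, 256 response modes
for h in all_histories(3):                        # one player's table, laid out as in Table 1
    print(h, [mode[h] for mode in endowment[0]])
```

The printed table has the same layout as Table 1, with one row per history and one column per response mode; the entries themselves depend on the seed.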

When faced with a history $h_m(t)$, a player has to choose which of his $n_s$ response modes to use in the next round. Each player $i$ keeps a virtual score $P_{i,\ell}(t)$ for each response mode $s_{i,\ell} \in S_i$ that reflects that response mode's past performance. The virtual score of each response mode is updated after each round, regardless of whether the response mode has been used or not. When a response mode would have correctly predicted the winning side, its virtual score is increased by the payoff it would have earned; otherwise it is decreased by the same amount. This means that players do not take the effect of their action on the aggregate outcome $A(t)$ into account. In determining the virtual score of a response mode, players only consider whether this response mode would have predicted the actual outcome correctly, neglecting the question whether playing this response mode would have affected the outcome.

Example 3.1. Suppose that the payoffs are of the form (2.3). Then, up to the constant factor $1/(2k+1)$, the updating rule is:

$$P_{i,\ell}(t+1) = P_{i,\ell}(t) - s_{i,\ell}(h_m(t)) \cdot A(t),$$

where $\ell \in \{1, \ldots, n_s\}$. Suppose that in some round $t$, player $i$ has chosen action $a_i(t) = -1$, and that the total number of players choosing action $a = -1$ is $k+1$, i.e., $-1$ is the majority action. Then the virtual score of all response modes prescribing $a = -1$ would be decreased by $(k+1) - k = 1$, while the virtual scores of all other response modes would be increased by 1. However, if player $i$ had played one of those response modes, the number of players choosing $a = +1$ would have been $k+1$, and $+1$ would have been the majority action. $\triangleleft$
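The numbers in Example 3.1 can be reproduced in a few lines. The Python sketch below is only an illustration: it assumes the unnormalized update $P \leftarrow P - s \cdot A$ (so that scores change by 1, as in the example), five players ($k = 2$), and a player with two response modes, one prescribing $-1$ and one prescribing $+1$ for the current history. All names are hypothetical.

```python
def update_scores(scores, prescribed, aggregate):
    """Virtual-score update P <- P - s(h) * A for all response modes of one player.
    `prescribed` holds the action each response mode recommends for the current history."""
    return [p - s * aggregate for p, s in zip(scores, prescribed)]

k = 2
actions = [-1, -1, -1, +1, +1]     # player 0 is one of the k + 1 = 3 players choosing -1
A = sum(actions)                   # aggregate action; negative, so -1 is the majority side

prescribed = [-1, +1]              # player 0 used the first mode; the second recommends +1
scores = update_scores([0, 0], prescribed, A)
print(A, scores)                   # the mode prescribing -1 loses a point, the other gains one

# Counterfactual: had player 0 followed the second response mode instead,
# +1 would have been chosen by k + 1 players and would have been the majority side.
counterfactual = [+1] + actions[1:]
print(sum(counterfactual), counterfactual.count(+1))
```

The second response mode is thus rewarded although it would have lost had it actually been played, which is the neglect of own impact described above.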

The probability that player $i \in \mathcal{N}$ chooses the response mode $s_{i,\ell} \in S_i$ in the next round is given by the well-known logit choice rule:

$$\mathrm{Prob}\{s_i(t) = s_{i,\ell}\} = \frac{\exp(\beta P_{i,\ell}(t))}{\sum_{\ell'=1}^{n_s} \exp(\beta P_{i,\ell'}(t))}. \qquad (3.1)$$

The parameter $\beta$ can be interpreted as the sensitivity of choice to marginal information. In the limiting case $\beta \to \infty$, play becomes fully deterministic in the sense that players choose the response mode with the highest virtual score. Allowing for $\beta < \infty$ adds noise at the individual level and introduces additional heterogeneity (Cavagna et al., 1999). When $\beta \to \infty$, all players endowed with a certain response mode keep the same virtual score for that response mode. By contrast, for finite $\beta$, players differ in their ranking of response modes, as their endowment of response modes determines the denominator of Equation (3.1). Perhaps surprisingly, this added heterogeneity and noise actually improves collective performance, as discussed in Section 4.1.
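The role of $\beta$ in the logit rule (3.1) is easily explored numerically. The following Python sketch is a minimal illustration with hypothetical function names; subtracting the highest score before exponentiating is a standard numerical safeguard and does not change the probabilities.

```python
import math
import random

def logit_probabilities(scores, beta):
    """Choice probabilities of Equation (3.1) for one player's virtual scores."""
    top = max(scores)                          # shift scores to avoid overflow in exp
    weights = [math.exp(beta * (p - top)) for p in scores]
    total = sum(weights)
    return [w / total for w in weights]

def choose_response_mode(scores, beta, rng):
    """Draw the index of the response mode to play according to Equation (3.1)."""
    probabilities = logit_probabilities(scores, beta)
    return rng.choices(range(len(scores)), weights=probabilities)[0]

scores = [3.0, 1.0, 2.0]
print(logit_probabilities(scores, beta=0.0))    # beta = 0: all response modes equally likely
print(logit_probabilities(scores, beta=1.0))
print(logit_probabilities(scores, beta=50.0))   # large beta: almost surely the best mode
print(choose_response_mode(scores, beta=1.0, rng=random.Random(0)))
```

For $\beta = 0$ virtual scores are ignored altogether, while for large $\beta$ the rule approaches the deterministic choice of the response mode with the highest virtual score.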

Actions, outcomes and performance are thus linked by a complex feedback system, as illustrated in Figure 3.1. Players observe the recent outcomes, and choose a response mode with a probability depending on the number of virtual points that response mode collected, resulting in an action $a \in \{-1, +1\}$. The actions of all players determine the winning side through the minority rule; this information is then fed back to the players and adds to the sequence of outcomes.
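The feedback loop just described can be put together in a short simulation. The Python sketch below is an illustration under stated assumptions, not a reference implementation: payoffs are of the form (2.3), response modes are drawn with replacement, all virtual scores start at zero, and the initial history is drawn at random. None of these initial conditions is specified in the text, and the function name `simulate` is ours.

```python
import math
import random
from itertools import product

def simulate(k, m, n_s, beta, rounds, seed=0):
    """Return the series of aggregate actions A(t) in a minority game with 2k + 1 players."""
    rng = random.Random(seed)
    n = 2 * k + 1
    histories = list(product((-1, +1), repeat=m))
    # Endowment: n_s random response modes per player, each a dict history -> action.
    modes = [[{h: rng.choice((-1, +1)) for h in histories} for _ in range(n_s)]
             for _ in range(n)]
    scores = [[0.0] * n_s for _ in range(n)]
    history = rng.choice(histories)             # arbitrary initial history (assumption)
    attendance = []
    for _ in range(rounds):
        actions = []
        for i in range(n):
            # Logit choice rule (3.1) over player i's response modes.
            top = max(scores[i])
            weights = [math.exp(beta * (p - top)) for p in scores[i]]
            chosen = rng.choices(range(n_s), weights=weights)[0]
            actions.append(modes[i][chosen][history])
        A = sum(actions)
        attendance.append(A)
        # Update every virtual score, whether the response mode was used or not.
        for i in range(n):
            for l in range(n_s):
                scores[i][l] -= modes[i][l][history] * A / n
        winner = -1 if A > 0 else +1            # minority side; A is odd, hence never 0
        history = history[1:] + (winner,)
    return attendance

series = simulate(k=50, m=3, n_s=2, beta=1.0, rounds=200, seed=1)
print(series[:10])
print(sum(A * A for A in series) / len(series))   # sample average of A(t)**2
```

Varying `m` for a fixed number of players gives a rough impression of how the fluctuations of $A(t)$ depend on the memory length; the printed values depend on the seed and on the assumptions listed above.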



3.2 Discussion 

In this section, we discuss two of the most important assumptions of the minority game learning model: the assumption that all players are endowed with a random subset of response modes and the assumption that players update the virtual scores of response modes not used, without taking into account the effect of that response mode on the game's outcome. Although the learning model of the minority game literature
Figure 3.1: A schematic overview of the minority game learning model. Figure taken from Moro (2003).



seems to depart markedly from the standard evolutionary and learning models used in 
economics, we argue here that in fact the learning model combines different aspects of 
several game-theoretic models to provide a realistic model of player behavior in congestion 
games. 



3.2.1 Response modes and heterogeneity 

In the learning model proposed in the minority game literature, players base their action on the recent past, trying to discern patterns in their opponents' behavior, as in Arthur (1994). Arthur proposes that players condition their decision to go to a bar on attendance levels in the previous weeks. He employs the terms "predictor" or "hypothesis" rather than response mode: if the bar has been crowded for the last three weeks, I expect it to be crowded next week also. These mental models are mapped into actions: if I expect the bar to be crowded, I will not go.

The response modes in the minority game learning model are a concise way of modelling 
this notion. An important question, however, is which response modes need to be included 
in the model. There are two possible avenues. Firstly, one could simply incorporate all 
possible response modes. However, if all possible response modes are included in the 
learning model, the strategy space becomes huge already for very simple games. Many different response modes are conceivable in a simple game such as the minority game, as illustrated by the list of examples in Arthur (1994).

A second possibility is to include only a selection of possible response modes. In that case, one could either make a selection based on behavioral assumptions, or let the subset of response modes be determined randomly. In the first case, a natural choice is to include response modes that reflect beliefs about other players' actions, based on recent outcomes. The first approach is commonly taken in the economics literature (e.g. Erev and Rapoport, 1998; Selten et al., 2007), while the minority game learning model takes the second avenue.



When players' response modes are drawn uniformly at random from the set of all possible response modes, there are no restrictions on the types of response modes that players use. At first sight, this may seem to be a weak point of the model, as response modes do not need to have a sensible interpretation in the minority game learning model. However, it can be shown that regardless of which response modes players are endowed with, players will self-organize into groups that use different response modes in such a way that their actions cancel out (see Challet and Zhang (1998); Hart et al. (2001); see also Section 4.3). Hence, the minority game learning model provides a possible explanation for the simultaneous evolution of behavioral rules (e.g. "switch roads if the road was crowded in the previous period") and their antagonists ("stay at the same road if the road was crowded in the previous period") often observed in congestion game experiments (e.g. Selten et al., 2007) through the structure of the game and players' heterogeneity. The strong point of the minority game learning model is exactly that no assumptions regarding response modes are needed. In games such as the minority game, whether a response mode is reasonable only depends on the response modes used by others.^6 Conversely, any response mode, whether it has a sensible interpretation or not, will work if opponents use response modes that recommend them to take the opposite action (see Section 4.3).

Note that the minority game differs in this respect from games such as the p-beauty contest (Keynes, 1936, p. 156).^7 Both in the p-beauty contest and the minority game,



^6 For instance, Selten et al. (2007) report that some subjects use a "direct" response mode in their experiments on route-choice games, while other subjects use a "contrarian" response mode. Subjects who use the former response mode will switch roads if they experienced congestion in the last period, while subjects using the contrarian response mode stick with their choice, as they expect other subjects to switch. The important point to note is that the direct response mode is only sensible if there are players who use the contrarian response mode and vice versa.

^7 In the p-beauty contest, players have to choose a number in a certain interval. Players have to guess what the average choice will be; the player that picks the number that is closest to some fraction $p < 1$ of the average choice will win. Suppose players have to choose a number between 0 and 100, and will win






players base their actions on their beliefs about other players' actions, who in turn base their actions on ... , etcetera. While in the p-beauty contest, this recursion of actions and beliefs ends at a well-defined limit point, the Nash equilibrium action, there is no such limit point in the minority game. This means that there is no action in the minority game that is optimal a priori, as in the p-beauty contest: if all think that $a = -1$ will be the minority choice, then all will choose that action.^8 In such games, agnosticism on the type of response modes that players use may well provide a more realistic model of players' reasoning processes than the more restrictive assumptions employed in different learning models. This offers an elegant solution to the dilemma signalled by Erev and Roth (1998, p. 873) that it is virtually impossible to include all possible behavioral rules, but that selection of specific rules bears the risk of "parameter fitting in a model with an enormous number of parameters". In the minority game learning model, no response mode is ruled out on a priori grounds, while sensible behavioral rules evolve naturally, as the only criterion for a behavioral rule to be sensible in the minority game is that there are other players who follow a "contrarian" behavioral rule.

However, this approach raises some questions. Firstly, one may ask why it is assumed that players are heterogeneous in their endowment of response modes. Perhaps more importantly, one could ask why players only consider a fixed number $n_s$ of response modes. Indeed, individual players have an incentive to increase the number of response modes they use, as that gives them an advantage over other players (Marsili et al., 2000). However, these assumptions are not uncommon in game-theoretic models of learning and bounded rationality.^9 Possible justifications for such assumptions include that each player has different experiences prior to playing the minority game and therefore deems different response modes more reasonable than others (cf. Aumann; Fudenberg and Levine, 1998, p. 4, and references therein), and that boundedly rational players may prefer to just consider a subset of response modes that have worked well in the past, rather than considering all $2^{2^m}$ response modes (cf. Ellison and Fudenberg).^10



if their choice is closest to $p = 2/3$ of the average choice. Then nobody will choose a number higher than $\frac{2}{3} \cdot 100$, so nobody should pick a number higher than $\frac{2}{3} \cdot \frac{2}{3} \cdot 100$, and so on. When players are rational and there is common knowledge of rationality, the equilibrium choice is 0.



^8 Also see Camerer and Fehr (2006). Camerer and Fehr explain behavior in congestion games and the p-beauty contest using the cognitive hierarchy approach (Camerer et al., 1999; Stahl and Wilson, 1995).

^9 For instance, in cognitive hierarchy models (Camerer and Ho, 1995), it is assumed that each player is of some exogenously specified type; players of different types use different strategies. Another example is the replicator dynamic (e.g. Weibull, 1995) in which players are "programmed" to play a given strategy.

^10 The precise value of $n_s$ is irrelevant. The qualitative behavior of the model is not affected by the






3.2.2 The law of simulated effect and boundedly rational players 



Which response mode players choose from the set of response modes they are endowed with is determined by the virtual score of each response mode. The learning process proposed in the minority game literature is closely related to the reinforcement learning model of Roth and Erev (1995) and Erev and Roth (1998). The main difference between the basic reinforcement learning model of Roth and Erev and the learning model of the minority game literature lies in the updating of the score of strategies or response modes not played. In the basic reinforcement learning model, the scores of these strategies are not updated, while in the minority game learning model, the scores of all response modes are updated in every period, as in hypothetical reinforcement learning or stochastic fictitious play (Fudenberg and Levine, 1998). The assumption that players also consider the payoffs to strategies or response modes not played seems to be reasonable. Camerer and Ho (1999) argue on the basis of theoretical arguments as well as on the basis of experimental results that players obey not only the "law of actual effect", but also the "law of simulated effect", meaning that in reinforcement, not only payoffs from strategies that are actually used count, but also foregone payoffs from strategies not played.

However, for players to play according to the law of simulated effect, they need more information than for standard reinforcement learning.^11 In general, to play according to fictitious play, players need to know the payoff rule as well as the actions of their opponents in addition to their own payoff. Even in a game such as the minority game, where the players only need to know the aggregate choice of other players (and not their individual choices), calculating foregone payoffs of strategies not used may be too hard for players that are boundedly rational. In the minority game learning model, players' bounded rationality is reconciled with the law of simulated effect by assuming that players do not take the effect of their own action on the global outcome into account. In that way, players can account for foregone payoffs of response modes not used, without having to do complicated calculations.

At first sight, one may think that for a large number of players, it does not matter 
whether players account for their own impact. However, due to the minority rule, there 
remains a systematic bias in the rewarding of response modes, even if the number of players 
goes to infinity. The reason is that the virtual score of a response mode that is currently 
played is systematically lower than that of the response modes that are not used. These 



choice of $n_s$, as long as there is some heterogeneity among players (Challet et al., 2004).

^11 Recall that players only need to know their own payoff to play according to the standard reinforcement learning model of Roth and Erev (1995).
latter response modes get a point if they prescribe the current minority side, even if they would have tipped the minority to the other side had they been played, so that they would have guessed wrong in reality (cf. Example 3.1). As the response mode that is actually played does not have this advantage, the response modes that are not played are systematically favored and hence results depend on whether players take the effect of their action on the aggregate outcome into account (Marsili et al., 2000; Marsili and Challet, 2001).



The minority game learning model thus combines features from several learning models in 
the literature on learning in games. However, the minority game learning model makes 
distinctly different predictions than game-theoretical learning models. To these predictions 
we now turn. 



4 Predictions of the learning model 

In this section, we discuss the main predictions of the minority game learning model. In the first two subsections, we characterize the behavior of the model in terms of social efficiency and informational efficiency, and show that the two are inherently linked in the minority game learning model. In Section 4.3, we show how different response modes may evolve, and discuss the implications for efficiency.



4.1 Volatility and attendance 



Typically, dust never settles down in the minority game learning model: the aggregate
attendance $A(t) := \sum_{i \in N} a_i(t)$ as a function of round number $t$ keeps fluctuating, as can
be seen in Figure 4.1. As the game is symmetric, the time average of $A(t)$ will be 0
in the steady state, as borne out by simulations (see e.g. Challet and Zhang, 1997, 1998;
Johnson et al., 1998; Manuca et al., 2000). More interesting is the behavior of the variance
$\sigma^2 := \langle A^2 \rangle$, where $\langle \cdot \rangle$ denotes the (time) average of a quantity. The variance, or volatility,
is a measure of the degree of efficiency achieved in a population. The higher the variance,
the larger the aggregate welfare loss: large fluctuations around the time average $\langle A \rangle = 0$
imply that the size of the minority is only small. When payoffs are linear in $A(t)$, this is
easy to see: in that case, total payoffs are proportional to $-\sum_{i \in N} a_i(t) A(t) = -(A(t))^2$.
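To make the definition of the volatility concrete, the following is a minimal simulation sketch, not the implementation used in the literature. It assumes the basic setup described in the text: an odd number of players, actions in {-1, +1}, response modes drawn at random, one virtual point for every response mode that prescribed the minority side, and play of a response mode with the highest virtual score. The function names, default parameter values, random tie-breaking and random initial history are illustrative assumptions.

```python
import random


def simulate_minority_game(num_players=101, m=3, n_s=2, rounds=2000, seed=0):
    """Return the list of attendances A(t) for a basic minority game."""
    if num_players % 2 == 0:
        raise ValueError("the number of players must be odd")
    rng = random.Random(seed)
    num_histories = 2 ** m
    # A response mode maps each of the 2^m histories to an action in {-1, +1}.
    modes = [[[rng.choice((-1, 1)) for _ in range(num_histories)]
              for _ in range(n_s)] for _ in range(num_players)]
    scores = [[0] * n_s for _ in range(num_players)]
    history = rng.randrange(num_histories)  # assumed: arbitrary initial history
    attendance = []
    for _ in range(rounds):
        actions = []
        for i in range(num_players):
            best = max(scores[i])
            # Play a response mode with the highest virtual score (ties at random).
            candidates = [j for j in range(n_s) if scores[i][j] == best]
            actions.append(modes[i][rng.choice(candidates)][history])
        a_total = sum(actions)  # aggregate attendance A(t); never 0 for odd N
        attendance.append(a_total)
        minority = -1 if a_total > 0 else 1
        for i in range(num_players):
            for j in range(n_s):
                # Every response mode that prescribed the minority side gets a
                # point, whether or not it was actually played.
                if modes[i][j][history] == minority:
                    scores[i][j] += 1
        bit = 1 if minority == 1 else 0
        history = ((history << 1) | bit) % num_histories
    return attendance


def volatility(attendance, burn_in=500):
    """Time average of A(t)^2 after discarding the first burn_in rounds."""
    tail = attendance[burn_in:]
    return sum(a * a for a in tail) / len(tail)
```

Dividing `volatility(simulate_minority_game(num_players=n, m=m))` by `n` gives the volatility per player for one random draw of response modes; averaging over several seeds would be needed before comparing values across parameter settings.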

It has been found that $\sigma^2/(2k+1)$ is only a function of $\alpha := 2^m/(2k+1)$ for a given value of
$n_s$, where we recall that $n_s$ is the number of response modes of each player (Savit et al.,
1999). Figure 4.2 shows the volatility as a function of $\alpha$. As can be seen in the figure,
the volatility converges to the volatility exhibited in the symmetric mixed-strategy Nash
equilibrium for $\alpha \to \infty$. With a large number of players ($\alpha$ small), overall performance is
much worse; in fact, the volatility is of the order of $(2k+1)^2$, so that the size of the winning
group is much smaller than $k$. At intermediate values of $\alpha$, volatility is low, and it attains
a minimum at $\alpha_c(n_s) \approx n_s/2 - 0.66$ (Marsili et al., 2000). Hence, at intermediate values of
$\alpha$, players are able to coordinate their actions and perform better collectively than under
the symmetric mixed-strategy Nash equilibrium. This means that players can exploit the
available information to predict future market movements so that the aggregate welfare loss
$\sigma^2$ is reduced relative to the symmetric mixed-strategy Nash equilibrium. Note that this
is not the result of some form of cooperative behavior of the players: agents are selfishly
maximizing their own return, and in doing that, they come closer to global efficiency.

Figure 4.1: Time evolution of the attendance $A(t)$ with $g[A(t)] = A(t)$, $2k+1 = 301$ and
$n_s = 2$. Panels correspond to $m = 2, 7, 15$ from top to bottom. Figure taken from
Moro (2003).

However, coordination is not complete under the current learning model. In the socially
efficient outcome, players would play according to one of the pure-strategy Nash equilibria
of the game, and the minority would consist of $k$ players. In that case, almost half of
the players are in the minority, and $\sigma^2/(2k+1) = 1/(2k+1)$. Players come close to this
optimum at $\alpha = \alpha_c$, although they never reach it. For smaller values of $\alpha$, performance
is much worse than under this optimum, while for large values of $\alpha$, aggregate payoffs
are close to those of the symmetric mixed-strategy Nash equilibria (see Figure 4.2). By
contrast, when players do take the effect of their own action on the aggregate outcome into
account, play converges to one of the pure-strategy Nash equilibria of the game so that
coordination is complete (Challet et al., 2000b; De Martino and Marsili, 2001; Marsili et al.,
2000; Marsili and Challet, 2001).

Figure 4.2: Volatility as a function of the order parameter $\alpha$ for $n_s = 2$ and different
numbers of players $N := 2k+1 = 101, 201, 301, 501, 701$ (□, ◇, △, ◁, ▽, respectively). The
critical value $\alpha_c$ is the value of $\alpha$ for which the volatility is at a minimum. Inset: Agent's
mean success rate as a function of $\alpha$. Figure taken from Moro (2003).
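The two benchmark values of the volatility per player that the text compares against can be checked in a few lines. The derivation below is a sketch under two assumptions: actions take values in $\{-1, +1\}$, and in the symmetric mixed-strategy Nash equilibrium each player independently chooses either action with probability 1/2.

\[
\text{Pure-strategy Nash equilibrium } (k \text{ versus } k+1 \text{ players}): \quad
A(t) = \pm 1 \;\Rightarrow\; \sigma^2 = \langle A^2 \rangle = 1, \quad
\frac{\sigma^2}{2k+1} = \frac{1}{2k+1}.
\]

\[
\text{Symmetric mixed-strategy Nash equilibrium}: \quad
\langle a_i \rangle = 0, \; \langle a_i^2 \rangle = 1, \; \langle a_i a_j \rangle = 0 \; (i \neq j)
\;\Rightarrow\; \sigma^2 = \sum_{i \in N} \langle a_i^2 \rangle = 2k+1, \quad
\frac{\sigma^2}{2k+1} = 1.
\]

A volatility per player below 1 therefore indicates better coordination than independent randomization, and $1/(2k+1)$ is the lower bound attained when the minority is as large as it can be.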



Footnote: Also, Kets and Voorneveld (2007) show that most standard learning processes such
as the replicator dynamic converge to the pure-strategy Nash equilibria of the game.

Strikingly, global efficiency is enhanced for certain values of $\alpha$ when players do not
always choose the response modes with the highest number of virtual points, i.e. when
$\beta < \infty$ in Equation (3.1). It can be shown that for $\alpha < \alpha_c$ (the socially inefficient regime),
volatility decreases when the noise level increases. For $\alpha > \alpha_c$, the value of $\beta$ does not affect
the level of volatility (Cavagna et al., 1999; Challet et al., 2000a; Bottazzi et al., 2001;
Marsili, 2004). This result is not so surprising, however, if one recalls that in the minority
game learning model, rational players herd in the socially inefficient regime ($\alpha < \alpha_c$).
When $\alpha < \alpha_c$, there are few response modes relative to the number of players. In that
case, players have to crowd at a limited number of response modes, leading to a large
number of players choosing the same alternative (see Section 4.3). Setting $\beta < \infty$ is
equivalent to updating the virtual scores of response modes more slowly.
A finite $\beta$ therefore acts as a brake against overreaction (Bottazzi et al., 2001).
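The role of $\beta$ can be illustrated with a small sketch. Equation (3.1) is not reproduced in this section, so the code assumes that it has the familiar logit form, in which a response mode is chosen with probability proportional to the exponential of $\beta$ times its virtual score. The function name and the example scores are hypothetical.

```python
import math


def logit_choice_probabilities(virtual_scores, beta):
    """Choice probabilities proportional to exp(beta * score) (assumed logit rule)."""
    top = max(virtual_scores)
    # Subtracting the maximum leaves the probabilities unchanged and avoids overflow.
    weights = [math.exp(beta * (u - top)) for u in virtual_scores]
    total = sum(weights)
    return [w / total for w in weights]


# beta = 0: pure noise, both response modes are equally likely.
uniform = logit_choice_probabilities([3, 1], beta=0.0)
# Large beta: the response mode with the highest score is chosen almost surely.
greedy = logit_choice_probabilities([3, 1], beta=50.0)
```

Under this assumed rule, lowering $\beta$ makes players respond less sharply to small differences in virtual scores, which is the sense in which a finite $\beta$ dampens overreaction.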



To summarize, the minority game learning model is characterized by competition and 
coordination. Agents compete in trying to exploit asymmetries in the game's outcome, but
at the same time, they try to reduce volatility, as volatility harms all players. Hence, there 
is a tension between competition and coordination. These two are intimately linked in the 
minority game learning model, as are information and efficiency. We discuss these issues 
in more detail in the next section. 



4.2 Information and efficiency 



As discussed in the previous section, players seem to be able to coordinate reasonably
well for some parameter configurations. The only way players can interact is through
the virtual scores of their response modes, implying that there is some information in
these values (Challet and Zhang, 1998). This observation led some authors to study the
information contained in the history of play. The information content of the history of
play, or the degree of predictability, can be measured by (Challet and Marsili, 1999)

$$H := \frac{1}{2^m} \sum_{\nu} \langle A(t+1) \mid h_m(t) = \nu \rangle^2,$$

where the time average of $A(t+1)$ is conditioned on the requirement that the last $m$
winning groups are given by $h_m(t)$. If $A(t+1)$ and $h_m(t)$ are independent, then $H = 0$.
Loosely speaking, $H$ measures the information in the time series of $A(t)$. If $H > 0$, then
the signal $A(t)$ contains information. It can be shown that players in the minority game
learning model minimize the degree of predictability (Challet et al., 2004). Depending on
the value of $\alpha$, they are more or less successful in doing that. At $\alpha_c$, the system changes
from an informationally efficient and socially inefficient phase ($H = 0$, $\sigma^2$ large) to an
information-rich and socially efficient phase ($H > 0$, $\sigma^2$ small). In the informationally
efficient phase, players do worse than players playing according to the symmetric
mixed-strategy Nash equilibrium. By contrast, in the information-rich phase, players manage to
coordinate and do better than players who play according to the symmetric mixed-strategy
Nash equilibrium.
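The predictability measure can be estimated from a recorded series of attendances. The sketch below assumes that $H$ averages the squared conditional time averages of $A(t+1)$ over the $2^m$ possible histories, that the winning side is +1 whenever the attendance is negative, and that histories that never occur contribute zero. The function name and the encoding of histories as integers are illustrative choices.

```python
def predictability(attendance, m):
    """Estimate H from a list of nonzero attendances A(t)."""
    num_histories = 2 ** m
    sums = [0.0] * num_histories
    counts = [0] * num_histories
    history = 0
    for t, a in enumerate(attendance):
        if t >= m:
            # A full m-round history is available: record A(t) under the
            # history that preceded it.
            sums[history] += a
            counts[history] += 1
        # Append the latest winning side (+1 wins when A(t) < 0) as one bit.
        bit = 1 if a < 0 else 0
        history = ((history << 1) | bit) % num_histories
    # Conditional time averages; unobserved histories contribute zero.
    means = [s / c for s, c in zip(sums, counts) if c > 0]
    return sum(x * x for x in means) / num_histories


# A perfectly predictable series: the sign of A(t) always flips.
alternating = predictability([1, -1] * 50, m=1)
# A series in which the last winning side says nothing about the next A(t).
uninformative = predictability([1, 1, -1, -1] * 2 + [1], m=1)
```

In the first series the previous winning side determines the next attendance exactly, so the estimate is large; in the second, the conditional averages cancel and the estimate is zero.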



At $\alpha_c$, the symmetry between the two actions is broken. In the so-called symmetric
phase ($\alpha < \alpha_c$), both actions are equivalent. Both actions are taken by the players with
equal frequency. For $\alpha > \alpha_c$, one of the actions is preferred, i.e. the outcome is asymmetric.
An asymmetry in the game's outcome represents an opportunity that could in principle
be exploited. Hence, this is just a concomitant feature of the presence or absence of
information in the history of play.

Footnote: The finding that a finite $\beta$ can reduce volatility is reminiscent of the findings of
Goeree et al. (2004), who show that payoff-dependent noise in the decision process is able to
break the cascades that would otherwise result in a social learning model.

Figure 4.3: Information $H$ (open symbols) and fraction of frozen players (full symbols)
as a function of the control parameter $\alpha = 2^m/(2k+1)$ for $n_s = 2$ and $m = 5, 6, 7$ (circles,
squares and diamonds, respectively). Figure taken from Moro (2003).



As an alternative to $H$, one could also consider the fraction of frozen players (Challet and
Marsili, 1999). Frozen players are players who never change their response mode in the stationary
state (in the limit of $\beta \to \infty$ in Equation (3.1)). That is, these players have one response
mode that outperforms all others. As can be seen from Figure 4.3, the fraction of
frozen players is zero in the informationally efficient phase, while it first rises for intermediate
values of $\alpha$ and then falls again when $\alpha$ goes to infinity. The intuition is that, for very
small values of $\alpha$ (the informationally efficient phase), both actions are equivalent, so that
there is little variation in the virtual scores of the different response modes. This means
that players switch response modes easily. For very large values of $\alpha$, players behave more
or less randomly, so they switch response modes frequently. Only at intermediate values
of $\alpha$ is the fraction of frozen players large, as many players have a response mode that is
superior to other response modes. Note that the success of a response mode depends on
the response modes used by opponents: a response mode per se is not superior, it is the
collective of response modes that is successful (see also Savit et al., 1999). In particular,
it only pays to be predictable if others are predictable as well.

Footnote: Note that this does not imply that these players always take the same action: a
response mode is a function of past play, hence the actions vary with the history $h_m$.

This transition between the informationally efficient and the information-rich phase, or
equivalently between the socially inefficient and the socially efficient phase, is central to the
minority game learning model. At this transition, there is a qualitative change in collective
behavior, while the principles behind the behavior of individuals remain unchanged. For
all values of $\alpha$, players in the minority game learning model try to outsmart each other,
but for low values of $\alpha$, they are on average less successful. In the next section, we discuss
the interpretation of $\alpha$.



4.3 Response modes and their antagonists 



The previous sections have shown that the qualitative behavior of the system depends
only on $\alpha = 2^m/(2k+1)$, not on other variables such as $n_s$. Moreover, for some values
of this parameter, players are much more successful in coordinating behavior than for
other values. What is the feature of the model underlying this behavior? We address
this question in the current section. The answer to this question points to an intuitive
interpretation of the model's results in terms of response modes and their antagonists.

The minority rule forces players to differentiate: if all players choose the same response
mode, all will lose. Agents want to be as far apart in the space of response modes as
possible. However, there are only $2^{2^m}$ possible response modes for $2k+1$ players. Hence,
one would expect that players succeed in differentiating if $2k+1 \ll 2^{2^m}$, while they behave
more like a crowd when $2k+1 \gg 2^{2^m}$. So, one would expect a qualitative change at
$2k+1 \sim 2^{2^m}$, rather than at $2k+1 \sim 2^m$, as observed. The reason that the transition
occurs at $2k+1 \sim 2^m$ rather than at $2k+1 \sim 2^{2^m}$ is that two response modes $s, s'$ only
give rise to distinctively different behavior if either they prescribe different actions for every
history of play (i.e., $s$ and $s'$ are anti-correlated) or if their predictions are uncorrelated
(Challet and Zhang, 1998; Johnson et al., 1998; Hart et al., 2001). It can be shown that
for every response mode $s$, the number of response modes that are anti-correlated or
uncorrelated with $s$ is given by $2 \cdot 2^m/n_s$ (Challet and Zhang, 1998; Hart et al., 2001).
Hence, $\alpha$ is proportional to the inverse of this number.
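The distinction between anti-correlated and uncorrelated response modes can be made concrete by comparing response modes history by history. The sketch below assumes the usual Hamming-distance convention: two response modes are anti-correlated if they prescribe different actions for every history, and uncorrelated if they differ on exactly half of the histories (so that, with all histories equally likely, their actions are statistically independent). It enumerates the full space of $2^{2^m}$ response modes, which is only feasible for small $m$; the function names are illustrative.

```python
from itertools import product


def differences(s, t):
    """Number of histories for which two response modes prescribe different actions."""
    return sum(x != y for x, y in zip(s, t))


def classify_against(s, m):
    """Classify every response mode by its relation to the response mode s."""
    num_histories = 2 ** m
    counts = {"anti-correlated": 0, "uncorrelated": 0, "other": 0}
    for t in product((-1, 1), repeat=num_histories):
        d = differences(s, t)
        if d == num_histories:
            counts["anti-correlated"] += 1   # differs for every history
        elif 2 * d == num_histories:
            counts["uncorrelated"] += 1      # differs for exactly half of them
        else:
            counts["other"] += 1             # includes s itself
    return counts


# m = 2: four histories, hence 2^(2^2) = 16 response modes in total.
example = classify_against((1, 1, -1, 1), m=2)
```

The pairwise count produced here is taken over the full space of response modes; it is meant only to illustrate the two notions and is not the count of a mutually uncorrelated set quoted in the text.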

This leads us to an intuitive interpretation of the model's results in terms of the interplay
between different response modes. Let $s$ be a response mode, and let $\bar{s}$ be the response
mode that is anti-correlated with $s$. Suppose $N_s$ players use the response mode $s$ at a given
time step, and $N_{\bar{s}}$ players use the anti-correlated response mode $\bar{s}$ at the same time step.
If $N_s \approx N_{\bar{s}}$ for all anti-correlated pairs $(s, \bar{s})$ of response modes, then the actions of players
using these response modes effectively cancel and the volatility will be small.

Hence, it would be optimal if the group of players that use a certain response mode is
of about the same size as the group that uses the "antagonistic" response mode. However,
this is not always possible, as the dimension of the space of response modes is fixed by the
parameter $m$. Hence, players can only be "far apart" in terms of response modes if the
number of players is not too large relative to the dimension of the response mode space.
For a given number of players, players cannot differentiate if $m$ is small, as the space of
response modes is too crowded in that case. The players display herding behavior: for a
pair of anti-correlated response modes $(s, \bar{s})$, almost all players herd at one of them, with
very few players choosing the other. Hence, the actions of the players choosing a given
response mode do not cancel those of the players using its antagonist, so that $\sigma^2$ will be
large. For somewhat larger $m$ (for a fixed number of players), players can differentiate, and
the actions of players effectively cancel. Hence, the system is quite successful collectively
at intermediate values of $\alpha$, although the minority rule prevents the system from attaining
full efficiency, i.e., not all players can be on the minority side. For a given $h_m$, the response
modes of most players are uncorrelated, but a small share of players uses response modes
that are mutually anti-correlated. This coordinated avoidance is beneficial for everybody,
as it helps to get a more even division of players over both alternatives (Zhang, 1998).

Now, for very large $m$ at a fixed number of players, the number of players using a given
response mode will only be small, so that players act more or less independently (Moro,
2003). However, the system still performs better than players who play the symmetric
mixed-strategy Nash equilibrium would, as there always exist pairs of players that follow
anti-correlated response modes, so that the players' actions are never truly independent and
$\sigma^2/(2k+1)$ is smaller than 1, the value of $\sigma^2/(2k+1)$ under the symmetric mixed-strategy
Nash equilibrium (Challet and Zhang, 1998).



5 Comparison to experimental results 

In this section, we discuss some experiments on the minority game and related
congestion games. In addition to the minority game, we focus on market entry games and
route-choice games. First, we briefly introduce these two classes of games. We then present
some experimental results, and discuss whether and how the learning model proposed in
the minority game literature could explain these results.

The market entry game (Selten and Güth, 1982) has been studied extensively in economics.
In a market entry game, $N \in \mathbb{N}$ players must decide independently and simultaneously
whether to enter a market with a fixed capacity $c < N$ or to stay out. Players who
enter the market receive a payoff that decreases in the number of entrants. The payoff of
players who stay out of the market is commonly taken to be constant. The game generally
has a large number of Nash equilibria, both in pure and in mixed strategies. Depending
on the exact form of the payoff function, there may even be a continuum of equilibria.
Pure-strategy Nash equilibria may be payoff-symmetric or payoff-asymmetric, and strict
or non-strict, depending on the choice of parameters. For the payoff functions commonly
studied, the number of entrants is between $c - 1$ and $c$ in equilibrium (Erev and Rapoport,
1998; Duffy and Hopkins, 2005). An important difference between the market entry game
and the minority game is that in the latter game, congestion effects are symmetric, while
in the former game, players can choose between a safe option with guaranteed payoffs
(staying out) and entering, the payoff of which depends on the number of other players
that enter.
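The statement about the equilibrium number of entrants can be checked by brute force for a specific payoff function. The sketch below assumes a linear specification that is not spelled out in the text: staying out yields a constant v, and entering yields v + r(c - n) when n players enter. The function name and the numerical values of v, r, c and the number of players are hypothetical.

```python
def equilibrium_entrant_counts(num_players, capacity, v=1.0, r=2.0):
    """Numbers of entrants n that are consistent with a pure-strategy Nash equilibrium."""
    def entry_payoff(n):
        # Assumed linear payoff to entering when n players enter in total.
        return v + r * (capacity - n)

    counts = []
    for n in range(num_players + 1):
        # No entrant gains by staying out (vacuous when nobody enters).
        entrants_content = n == 0 or entry_payoff(n) >= v
        # No outsider gains by entering (vacuous when everybody enters).
        outsiders_content = n == num_players or v >= entry_payoff(n + 1)
        if entrants_content and outsiders_content:
            counts.append(n)
    return counts


integer_capacity = equilibrium_entrant_counts(num_players=20, capacity=8)
fractional_capacity = equilibrium_entrant_counts(num_players=20, capacity=8.5)
```

With an integer capacity, both c - 1 and c entrants are consistent with equilibrium under this specification, and the marginal player is indifferent; with a non-integer capacity, the equilibrium number of entrants is unique and no player is indifferent.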

As the market entry game is a congestion game, the fictitious play process converges in
beliefs to one of the Nash equilibria of the game (Monderer and Shapley, 1996a). Duffy and
Hopkins (2005) show that the evolutionary replicator dynamic converges to one of its rest
points, and that the mixed-strategy Nash equilibria of the game are unstable under the dynamic.
They also show that under standard reinforcement learning (Roth and Erev, 1995), the
learning process converges with probability one to one of the pure-strategy Nash equilibria
of the game (when $c \notin \mathbb{N}$). Under hypothetical reinforcement learning, where also the
propensities of strategies not used are updated, the learning process converges with
probability one to one of the (logit) perturbed equilibria corresponding to the pure-strategy
Nash equilibria of the game for $c \notin \mathbb{N}$ (Duffy and Hopkins, 2005).
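The difference between standard and hypothetical reinforcement lies in which propensities are updated after a round. The sketch below illustrates only that difference and is not the full specification of either model: it assumes that a propensity is increased by the payoff of the corresponding strategy, and it leaves out the choice rule and any further parameters. The function name and the payoff numbers are hypothetical.

```python
def update_propensities(propensities, chosen, payoffs, hypothetical=False):
    """Return updated propensities after one round.

    payoffs[k] is the payoff strategy k earned (if chosen) or would have earned.
    Standard reinforcement only uses payoffs[chosen], the player's own payoff.
    """
    updated = list(propensities)
    for k, payoff in enumerate(payoffs):
        if hypothetical or k == chosen:
            updated[k] += payoff
    return updated


# The player chose strategy 0; strategy 1 would have earned 0.5.
standard = update_propensities([1.0, 1.0], chosen=0, payoffs=[2.0, 0.5])
counterfactual = update_propensities([1.0, 1.0], chosen=0, payoffs=[2.0, 0.5],
                                     hypothetical=True)
```

The hypothetical variant requires players to know what the strategies they did not use would have paid, which parallels the virtual scores of unused response modes in the minority game learning model.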

Footnote: For studies of the market entry game, see e.g. Duffy and Hopkins (2005),
Kahneman (1988), Erev and Rapoport (1998), Rapoport et al. (1998, 2000), and Sundali et al.
(1995).

Footnote: The perturbed equilibria are the logit quantal response equilibria (McKelvey and
Palfrey, 1995) of the game.

Route-choice games are closer to the minority game in that there is no safe option. In a
route-choice game, players choose between two or more roads. The payoff of choosing one
of these roads falls in the number of other players who have chosen that road. Roads may
differ in terms of capacity. In equilibrium, players divide themselves over the roads in such
a way that traveling times and hence payoffs are equalized. These games have been studied
experimentally by a number of authors. An important difference with the minority game
is that the pure-strategy Nash equilibria of the route-choice game are payoff-symmetric.
Moreover, these Nash equilibria are strict, unlike in the minority game. It is easy to see
that the fictitious play process converges in beliefs to one of the Nash equilibria of the game
(Monderer and Shapley, 1996a). No other analytic results are available on the behavior
of different learning processes in this type of game; however, given the similarities with
market entry games and the minority game, we may expect learning processes to behave
similarly in these games.
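The equalization of traveling times in equilibrium can be illustrated with a two-road example. The sketch below searches for splits of drivers over a main road and a side road such that no driver can shorten the trip by switching. The linear travel-time functions and the number of drivers are illustrative assumptions, not values taken from the text.

```python
def equilibrium_splits(num_drivers, time_main, time_side):
    """Splits (n_main, n_side) from which no single driver wants to switch roads."""
    splits = []
    for n_main in range(num_drivers + 1):
        n_side = num_drivers - n_main
        # A main-road driver who switches would face n_side + 1 drivers on the side road.
        main_stays = n_main == 0 or time_main(n_main) <= time_side(n_side + 1)
        # A side-road driver who switches would face n_main + 1 drivers on the main road.
        side_stays = n_side == 0 or time_side(n_side) <= time_main(n_main + 1)
        if main_stays and side_stays:
            splits.append((n_main, n_side))
    return splits


def main_road(n):
    # Assumed travel time on the higher-capacity road with n drivers.
    return 6 + 2 * n


def side_road(n):
    # Assumed travel time on the lower-capacity road with n drivers.
    return 12 + 3 * n


splits = equilibrium_splits(18, main_road, side_road)
```

For these assumed functions the only stable split puts 12 drivers on the main road and 6 on the side road, where both travel times equal 30, and any driver who switches does strictly worse, so the equilibrium is both payoff-symmetric and strict.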



The minority game has been discussed in detail in Section 2. Kets and Voorneveld
(2007) study the predictions of different learning models for the minority game. They show
that the collection of Nash equilibria with at most one mixer is asymptotically stable under
the multi-population replicator dynamic, while other stationary states of the replicator
dynamic are not Lyapunov stable (e.g. Weibull, 1995). Finally, as in all congestion games,
the fictitious play process converges in beliefs to one of the Nash equilibria of the game.

We now discuss some experimental results on market entry games, route-choice games and
the minority game, and whether, and how, these results can be explained by the minority
game learning model. A robust finding in experiments on these games is that subjects
quickly achieve a "magical" degree of coordination. However, individual players generally
do not play equilibrium strategies. For instance, while Erev and Rapoport (1998) find
that the number of entrants in a market entry game rapidly converges to the equilibrium
value, they also observe large between- and within-subject variability, which does not
diminish with experience. This is a common finding in experiments on market entry games
(Ochs, 1999, p. 169). Similarly, in experiments on a route-choice game, Selten et al.
(2007) observe that the mean number of drivers on the different roads is very close to the
equilibrium number, while large fluctuations persist until the end of the session. Similar
experimental results have been reported for the minority game (Chmura and Pitz, 2004;
Platkowski and Ramsza, 2003; Bottazzi and Devetag, 2007). In all cases, the hypothesis
that fluctuations can be explained by a symmetric mixed-strategy Nash equilibrium
of the game can be rejected. These results cannot be explained with standard
learning or evolutionary models, as these models typically predict convergence to the
pure-strategy Nash equilibria of such games (Duffy and Hopkins, 2005; Kets and Voorneveld,
2007). However, as discussed in Section 4, the minority game learning model predicts
precisely that average behavior will be close to the equilibrium prediction, while
fluctuations will persist.

Footnote: For experimental studies of route-choice games, see e.g. Iida et al., ..., and
Selten et al. (2007).

Footnote: An exception is Duffy and Hopkins (2005), who find that subjects coordinate on one
of the pure Nash equilibria of the market entry game after a large number of rounds when they
are given feedback on others' choices.

Some authors attempt to reconcile aggregate "equilibrium" behavior in experiments
with individual non-equilibrium play by conjecturing that subjects may use counteracting
behavioral rules. For instance, Selten et al. (2007) report that some subjects revise their
choice if the road of their choice turned out to be congested, while other players stick with
their choice in the expectation that others will switch. Also, Bottazzi and Devetag (2007)
find that there is considerable heterogeneity in players' behavior in their experiments on
the minority game. They show that it is not the heterogeneity per se which determines
the players' success in coordinating; rather, it is through the interaction between these
different behavioral rules that players can successfully coordinate on choosing different actions.
These findings are in line with the predictions of the minority game learning model that
response modes and their antagonists coevolve in such a way that their actions effectively
cancel out, thus reconciling aggregate equilibrium behavior and individual non-equilibrium
play.

However, it is not fully clear which behavioral rules subjects employ. For instance,
Selten et al. (2007) are unable to classify 42% of the subjects in terms of the behavioral
rules they use in their route-choice experiments. This leaves open the possibility that
subjects use some response modes that may not have an intuitive interpretation and are
thus not recognized by the experimenters, but that nevertheless perform well as response
modes and their antagonists coevolve, as predicted by the minority game learning model
(see Sections 3.2 and 4.3). A systematic study of the different response modes used by
experimental subjects seems needed. Indeed, Zwick and Rapoport (2002) conclude that
there is a need "to re-orient research on interactive decision making to individual
differences, identify patterns of behavior shared by subsets of players . . . , and then attempt to
account for aggregate behavior in terms of the behavior of the clusters of players that form
these aggregates".

Footnote: On counteracting behavioral rules, see e.g. Bottazzi and Devetag (2007), Rapoport
et al. (2000), Selten et al. (2007), Chmura and Pitz (2004), Erev and Rapoport (1998), and
Zwick and Rapoport (2002).

Finally, the effect of information on players' behavior in such games remains a puzzle.
Two dimensions of information have been investigated in the experimental literature.
Firstly, it has been studied how behavior depends on the information given on other players'
choices. Players can be provided with information only on the payoff rule and aggregate
behavior in the past rounds or may be informed additionally of the individual choices of all
other players. If players learn e.g. according to the standard reinforcement learning model
of Roth and Erev (1995), hypothetical reinforcement (Duffy and Hopkins, 2005), the
minority game learning model, or if the learning process can be described by the replicator
dynamic, this should not affect results.

However, in many experimental studies, behavior differs qualitatively depending on the
information players have. Duffy and Hopkins (2005) report that behavior becomes less
random when players are provided with information on the individual choices of other
players: the hypothesis of randomizing behavior can be rejected for a larger share of the
players, and subjects seem to display some inertia in their behavior. However, this may
be due to the fact that the additional information given to players allows them to play
complicated repeated-game strategies: players may signal their commitment to a certain
action. While for the market entry game, such a signalling strategy pays off, this is not
the case in the minority game. Also, one can imagine that feelings like regret or envy
play a larger role in the market entry game (Erev and Rapoport, 1998). In that sense,
experiments on the minority game provide a cleaner test of learning theory. Nevertheless,
Bottazzi and Devetag (2007) find that providing players with additional information on
their opponents' play makes players switch less often between different actions. In the
treatment with full information on individual players' actions, players tend to stick more
often to their last period's action, especially when this action was the minority action.
Combined with some heterogeneity in players' beliefs, this inertia and "reinforcement"
effect partly explains players' success at coordinating in the minority game. However,
Bottazzi and Devetag show that inertia, reinforcement, and heterogeneity alone are not
sufficient: players' strategies also coevolve, or self-organize to improve aggregate payoffs,
as predicted by the minority game learning model.

Footnote: To see why signalling commitment does not pay off in the minority game, suppose
that $k$ players commit to action $a = -1$, and $k$ players commit to action $a = +1$.
The remaining player will not be deterred from choosing either of those actions by the commitment of other
players, nor does the commitment of these players guarantee them a positive payoff. A repeated-game
strategy that does pay off in the minority game is one in which players "take turns": players alternately
choose each of the two actions in such a way that each player is in the minority roughly half of the time.
Indeed, Helbing et al. (2005) find some evidence of such behavior in their experiments on route-choice
games with small groups, but it is unlikely that players will be able to successfully play according to such
a repeated-game equilibrium when the number of players is large.

A second dimension of information that has been studied in the literature refers to the
salience of information on the recent history of play. Bottazzi and Devetag (2007) provide
players with a string of past outcomes of varying length. When players are provided with
information on play in more rounds than just the previous one, aggregate efficiency is
significantly improved. They find that providing players with a string of greater length
allows players to correlate their behavior over a longer time period: when players are
provided with the outcome of the previous round, there is only a significant relation between
present and past choices for the first two time lags, whereas such a relation holds for up to
three time lags when more information is provided. Notably, in a treatment where players
are provided with a string of intermediate length and the degree of aggregate efficiency
is highest, play is characterized by a substantial lack of short-range correlations between
current and past actions: players seem to exploit the additional information to improve
their payoffs.

Altogether, these experimental studies give some support to the learning model proposed
in the minority game literature. However, the question of how information influences
play in congestion games has still not been satisfactorily answered. It would be
interesting to compare players' behavior under different informational treatments in different
congestion games. While most learning models make similar predictions for the different
congestion games discussed here, intuitively, one would expect that information will play
a different role in these games, as emotions like envy and regret will be more important in
some games than in others, and also the scope for repeated-game strategies differs across
games. Such a systematic comparison would allow one to better separate the learning
effects from possible repeated-game and behavioral effects.

6 Conclusions 

In this paper, we have given a critical account of the learning model proposed in the
minority game literature, and related it to standard learning and evolutionary models in
economics, showing that it shares quite a few features with
these models. Still, the predictions of this learning model are markedly different from the
predictions of other models. However, these predictions are in line with some
experimental results on the minority game and related games, which cannot be explained by
other models.

However, our understanding of learning in such games is far from complete. For
instance, the effect of feedback on play is unclear. An interesting direction for further research
would be to systematically vary players' information in experiments on different congestion
games such as the minority game and the market entry game, and to compare play
under the different information treatments and across games. While most learning models
provide similar predictions for these games, intuitively, one would expect that information
may have different effects in these games, as in some games, repeated-game strategies or
emotions may play a larger role than in others. Such an experiment may help shed light
on the question of which learning model is appropriate in such games.